@chienyuanchang
Collaborator

Purpose

  • ...

Does this introduce a breaking change?

[ ] Yes
[ ] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

  • Get the code
git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install
  • Test the code

What to Check

Verify that the following are valid

  • ...

Other Information

elif value_type == "number":
try:
di_label["valueNumber"] = float(value.get("content")) # content can be easily converted to a float
content_val = value.get("content")
Collaborator Author

Hi @aainav269, I hit errors when converting fields that were labeled by region in DI Studio, because those fields have no content. Have we seen this error before, and is it acceptable to set the value to None in that case?

Contributor

I do remember seeing these region fields before, but only in DI 3.1. I think we decided to just ignore these region fields when converting to CU.

Collaborator Author

I see. I need to set the value to None to avoid the errors.
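
A minimal sketch of the None fallback discussed above. Only `value.get("content")` and the `valueNumber` key come from the diff; the helper name `to_value_number` is hypothetical, and returning None for non-numeric content is an assumption, since the diff does not show the `except` branch.

```python
from typing import Any, Dict, Optional


def to_value_number(value: Dict[str, Any]) -> Optional[float]:
    """Return the field content as a float, or None if it is missing or not numeric."""
    content_val = value.get("content")
    if content_val is None:
        # Region-labeled fields may have no text content at all.
        return None
    try:
        return float(content_val)
    except (TypeError, ValueError):
        # Assumption: unparseable content is treated like missing content.
        return None


di_label = {"valueNumber": to_value_number({"content": "12.5"})}
```

With this shape, a region field without content yields `valueNumber: None` instead of raising inside `float()`.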


# imports from same project
from constants import CU_API_VERSION, MAX_FIELD_LENGTH, VALID_CU_FIELD_TYPES
from constants import CU_API_VERSION, MAX_FIELD_LENGTH, VALID_CU_FIELD_TYPES, COMPLETION_MODEL, EMBEDDING_MODEL
Collaborator Author

Hi @aainav269, I found that we only validate the length of the field name; we do not check or normalize the name against our current field-name limitations. It also looks like we don't check or remove the field format. Do you recall the discussion about field name normalization in this tool?

Contributor

Yes, we decided then to remove the fields that exceed the field name length. One point of discussion was if we shorten the field name, could there be another field with that name? Ex: if we have ...._Yes and ...._No and we shorten both, it would be ....

I don't think we ever validated the field format. I think we assumed that if the field was already generated by DI, the format would apply to CU as well. What are you thinking of enforcing for this?

Collaborator Author

@chienyuanchang Jan 21, 2026

CU has more restrictions on field names than DI: no white space, and underscores are the only symbol allowed. If we didn't skip this intentionally, I will add logic to validate and modify the names.
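
One way the validation and modification could look, as a rough sketch rather than the tool's actual logic. It assumes CU accepts only letters, digits, and underscores (the thread only says "no white spaces" and "only underscores"), uses a placeholder value for `MAX_FIELD_LENGTH` (the real one lives in `constants`), and the function names are hypothetical. Following the earlier decision in this thread, over-long names are dropped rather than shortened, and names that collide after normalization are skipped.

```python
import re
from typing import Dict, Iterable, List, Tuple

MAX_FIELD_LENGTH = 64  # placeholder; the real limit is defined in constants.py


def normalize_field_name(name: str) -> str:
    """Replace each run of characters outside [0-9A-Za-z_] with one underscore."""
    return re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())


def normalize_fields(names: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map original names to normalized names; collect the names that were skipped."""
    kept: Dict[str, str] = {}
    skipped: List[str] = []
    used = set()
    for name in names:
        candidate = normalize_field_name(name)
        # Skip empty, over-long, or colliding names instead of guessing a rename.
        if not candidate or len(candidate) > MAX_FIELD_LENGTH or candidate in used:
            skipped.append(name)
            continue
        used.add(candidate)
        kept[name] = candidate
    return kept, skipped
```

Returning the skipped names lets the caller report exactly which DI fields were not carried over, which matters for the `..._Yes` / `..._No` collision case raised above.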


# Set the global variables
api_version = os.getenv("API_VERSION")
api_version = os.getenv("API_VERSION") or CU_API_VERSION
Contributor

What if the value in the OS environment doesn't match the api-version?

Collaborator Author

This code tries to read API_VERSION from the environment first. If it is not set, the program uses the default CU_API_VERSION. We could also consider always using the latest default API version instead of letting the user supply one.
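
A small sketch of how the `or` fallback behaves. The `CU_API_VERSION` value here is a placeholder, and `resolve_api_version` is a hypothetical wrapper added only so the behavior can be shown without touching the real environment.

```python
import os
from typing import Mapping, Optional

CU_API_VERSION = "default-version"  # placeholder; the real value is defined in constants.py


def resolve_api_version(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer API_VERSION from the environment, otherwise use the project default."""
    env = os.environ if env is None else env
    # `or` also covers an empty string, not only a missing variable.
    return env.get("API_VERSION") or CU_API_VERSION


api_version = resolve_api_version()
```

Note that this only handles a missing or empty variable. A value that is set but not supported is passed through unchanged, so the mismatch case raised above would need a separate check.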

"Apim-Subscription-id": f"{subscription_key}",
"Content-Type": "application/pdf",
"Ocp-Apim-Subscription-Key": f"{subscription_key}",
"Content-Type": "application/octet-stream",
Contributor

Why are we changing the content type while keeping the content the same?

Collaborator Author

I'm not sure which content you mean by "the same". We may have image files. Do we only support PDF files in this tool?
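
For comparison, an alternative to a fixed content type would be to guess it per file and fall back to `application/octet-stream`. This is only a sketch: `build_headers` is a hypothetical helper, the header names are taken from the diff, and whether the service accepts each guessed MIME type is an assumption that would need checking.

```python
import mimetypes
from typing import Dict


def build_headers(subscription_key: str, file_path: str) -> Dict[str, str]:
    """Build upload headers, guessing the content type from the file extension."""
    guessed, _encoding = mimetypes.guess_type(file_path)
    return {
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Fall back to a generic binary type when the extension is unknown.
        "Content-Type": guessed or "application/octet-stream",
    }
```

With this approach a `.pdf` file would still be sent as `application/pdf`, while files without a recognizable extension use the generic binary type.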

3 participants